Hannah Meyer
January 28, 2020
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.2
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
| Element | Meaning |
| --- | --- |
| function_output | what the function returns |
| <- | the assignment operator |
| function | name of the function |
| argument1 | first argument the function accepts |
| something | our specification to argument1 |
| argument2 | second argument the function accepts |
| something_else | our specification to argument2 |
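The anatomy above can be tried out with a function that ships with base R. The sketch below uses seq(), chosen here purely for illustration (it is not part of the workshop data): from and to play the roles of argument1 and argument2, and the object name years is a made-up stand-in for function_output.

```r
# General pattern:
# function_output <- function(argument1 = something, argument2 = something_else)
# seq() is an illustrative stand-in; other functions follow the same pattern.
years <- seq(from = 1968, to = 1971)

# Typing the name of the object prints what the function returned
years
```

Naming the arguments explicitly (from =, to =) is optional when they are given in the documented order, but it makes the call easier to read.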
| Element | In our example |
| --- | --- |
| function_output | coord |
| <- | the assignment operator |
| function | read_csv |
| argument1 | file |
| something | "data/2004_Science_Smith_data.csv" |
Data from Smith et al. (2004): http://www.antigenic-cartography.org/
## Parsed with column specification:
## cols(
## name = col_character(),
## year = col_double(),
## cluster = col_character(),
## type = col_character(),
## x.coordinate = col_double(),
## y.coordinate = col_double(),
## location = col_character(),
## lat = col_double(),
## lng = col_double()
## )
1. Use ? to have a look at the documentation of read_csv.
2. Look at the coord object by creating a new chunk, typing coord and executing the code chunk.
3. How is the column specification of read_csv represented in coord?

1. Use ? to have a look at the documentation of read_csv.

read_delim {readr}    R Documentation
Read a delimited file (including csv & tsv) into a tibble
Description
read_csv() and read_tsv() are special cases of the general read_delim().
They're useful for reading the most common types of flat file data, comma
separated values and tab separated values, respectively [...]
2. Look at the coord object by creating a new chunk, typing coord and executing the code chunk.

## # A tibble: 322 x 9
## name year cluster type x.coordinate y.coordinate location lat lng
## <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 BI/1… 1968 HK68 AG 4.05 15.0 BILTHOV… 52.1 5.02
## 2 BI/1… 1968 HK68 AG 4.10 14.8 BILTHOV… 52.1 5.02
## 3 BI/1… 1968 HK68 AG 4.36 13.9 BILTHOV… 52.1 5.02
## 4 BI/8… 1969 HK68 AG 3.87 14.3 BILTHOV… 52.1 5.02
## 5 BI/9… 1969 HK68 AG 4.87 14.1 BILTHOV… 52.1 5.02
## 6 BI/1… 1969 HK68 AG 4.40 14.9 BILTHOV… 52.1 5.02
## 7 BI/9… 1970 HK68 AG 5.06 14.5 BILTHOV… 52.1 5.02
## 8 BI/2… 1970 HK68 AG 4.82 15.5 BILTHOV… 52.1 5.02
## 9 BI/6… 1971 HK68 AG 3.87 15.9 BILTHOV… 52.1 5.02
## 10 BI/2… 1971 HK68 AG 4.27 14.1 BILTHOV… 52.1 5.02
## # … with 312 more rows
3. How is the column specification of read_csv represented in coord?

Compare the message printed by read_csv:
Parsed with column specification:
cols(
name = col_character(),
year = col_double(),
cluster = col_character(),
type = col_character(),
x.coordinate = col_double(),
y.coordinate = col_double(),
location = col_character(),
lat = col_double(),
lng = col_double()
)
to the column specification in coord
## # A tibble: 6 x 9
## name year cluster type x.coordinate y.coordinate location lat lng
## <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 BI/15… 1968 HK68 AG 4.05 15.0 BILTHOV… 52.1 5.02
## 2 BI/16… 1968 HK68 AG 4.10 14.8 BILTHOV… 52.1 5.02
## 3 BI/16… 1968 HK68 AG 4.36 13.9 BILTHOV… 52.1 5.02
## 4 BI/80… 1969 HK68 AG 3.87 14.3 BILTHOV… 52.1 5.02
## 5 BI/90… 1969 HK68 AG 4.87 14.1 BILTHOV… 52.1 5.02
## 6 BI/17… 1969 HK68 AG 4.40 14.9 BILTHOV… 52.1 5.02
The most common data types in R (base R and tidyverse) are:
| Abbreviation | Type | Examples |
| --- | --- | --- |
| int | integers | 1, 2, 3 |
| dbl | doubles | 1.2, 1.7, 9.0 |
| chr | character | "a", "b", "word" |
| lgl | logical | TRUE or FALSE |
| fct | factors | categorical variables with fixed values |
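To check these types yourself, base R's typeof() and class() report how a value is stored. The values below are made up for illustration and are not taken from coord; note the L suffix, which is what makes a number an integer rather than a double.

```r
# Illustrative values, not taken from the coord data
typeof(3L)        # "integer"
typeof(9.0)       # "double": numbers without the L suffix are doubles
typeof("word")    # "character"
typeof(TRUE)      # "logical"

# A factor stores categories as integer codes plus a fixed set of levels
measurement <- factor(c("AG", "SR", "AG"), levels = c("AG", "SR"))
class(measurement)    # "factor"
levels(measurement)   # "AG" "SR"
```

This also explains why year is listed as dbl in the column specification: read_csv stores plain numeric columns as doubles unless told otherwise.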
data.frame
## name year cluster type x.coordinate y.coordinate location lat
## 1 BI/15793/68 1968 HK68 AG 4.048064 14.97272 BILTHOVEN 52.14
## 2 BI/16190/68 1968 HK68 AG 4.103302 14.80633 BILTHOVEN 52.14
## 3 BI/16398/68 1968 HK68 AG 4.363448 13.89293 BILTHOVEN 52.14
## 4 BI/808/69 1969 HK68 AG 3.871698 14.25529 BILTHOVEN 52.14
## 5 BI/908/69 1969 HK68 AG 4.868656 14.09319 BILTHOVEN 52.14
## 6 BI/17938/69 1969 HK68 AG 4.400375 14.86012 BILTHOVEN 52.14
## lng
## 1 5.02
## 2 5.02
## 3 5.02
## 4 5.02
## 5 5.02
## 6 5.02
tibble
## # A tibble: 6 x 9
## name year cluster type x.coordinate y.coordinate location lat lng
## <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 BI/15… 1968 HK68 AG 4.05 15.0 BILTHOV… 52.1 5.02
## 2 BI/16… 1968 HK68 AG 4.10 14.8 BILTHOV… 52.1 5.02
## 3 BI/16… 1968 HK68 AG 4.36 13.9 BILTHOV… 52.1 5.02
## 4 BI/80… 1969 HK68 AG 3.87 14.3 BILTHOV… 52.1 5.02
## 5 BI/90… 1969 HK68 AG 4.87 14.1 BILTHOV… 52.1 5.02
## 6 BI/17… 1969 HK68 AG 4.40 14.9 BILTHOV… 52.1 5.02
Variables (columns) and observations (rows) in coord:
## # A tibble: 322 x 9
## name year cluster type x.coordinate y.coordinate location lat lng
## <chr> <dbl> <chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 BI/1… 1968 HK68 AG 4.05 15.0 BILTHOV… 52.1 5.02
## 2 BI/1… 1968 HK68 AG 4.10 14.8 BILTHOV… 52.1 5.02
## 3 BI/1… 1968 HK68 AG 4.36 13.9 BILTHOV… 52.1 5.02
## 4 BI/8… 1969 HK68 AG 3.87 14.3 BILTHOV… 52.1 5.02
## 5 BI/9… 1969 HK68 AG 4.87 14.1 BILTHOV… 52.1 5.02
## 6 BI/1… 1969 HK68 AG 4.40 14.9 BILTHOV… 52.1 5.02
## 7 BI/9… 1970 HK68 AG 5.06 14.5 BILTHOV… 52.1 5.02
## 8 BI/2… 1970 HK68 AG 4.82 15.5 BILTHOV… 52.1 5.02
## 9 BI/6… 1971 HK68 AG 3.87 15.9 BILTHOV… 52.1 5.02
## 10 BI/2… 1971 HK68 AG 4.27 14.1 BILTHOV… 52.1 5.02
## # … with 312 more rows
| Variable | Description |
| --- | --- |
| name | name of virus isolate |
| year | year of isolation |
| cluster | derived cluster |
| type | serum or antigen measurement |
| x.coordinate | x coordinate in antigenic space |
| y.coordinate | y coordinate in antigenic space |
| location | location of virus measurement |
| lat | latitude of location |
| lng | longitude of location |
1. Run ggplot(data = coord). What do you see?
2. What makes this simple plot look very different from the map that we want to achieve?
3. What other information in our data object coord could we use?
# color mapped to cluster inside aes(), with a qualitative Brewer palette
p + geom_point(aes(x = x.coordinate, y = y.coordinate, color = cluster)) +
    scale_color_brewer(type = "qual", palette = "Set3")

# shape set outside aes(): every point is drawn as a filled triangle (shape 17)
p + geom_point(aes(x = x.coordinate, y = y.coordinate, color = cluster),
    shape = 17) + scale_color_brewer(type = "qual", palette = "Set3")

1. Map cluster to size and shape instead of color. Does this convey the same level of information as a color scale?
2. Which variable would suit a shape scale? Add a shape aesthetic for the variable you identified.

1. Map cluster to size and shape instead of color. Does this convey the same level of information as a color scale?

## Warning: Using size for a discrete variable is not advised.
## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have
## 11. Consider specifying shapes manually if you must have them.
## Warning: Removed 111 rows containing missing values (geom_point).

2. Which variable would suit a shape scale? Add a shape aesthetic for the variable you identified.

p + geom_point(aes(x = x.coordinate, y = y.coordinate, color = cluster)) +
    scale_color_brewer(type = "qual", palette = "Set1")

p + geom_point(aes(x = x.coordinate, y = y.coordinate, color = cluster)) +
    scale_color_brewer(type = "qual", palette = "Set1")

-> Not enough classes in palette
-> color is inside the aesthetics mapping; if manually setting colors, move outside of aes
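The difference between mapping color inside aes() and setting it outside can be sketched as follows. The data frame toy and the color "steelblue" are hypothetical stand-ins chosen for illustration; only the column names follow coord.

```r
library(ggplot2)

# Hypothetical toy data standing in for coord
toy <- data.frame(
  x.coordinate = c(1, 2, 3, 4),
  y.coordinate = c(2, 1, 4, 3),
  cluster = c("HK68", "HK68", "EN72", "EN72")
)
p <- ggplot(data = toy)

# Mapping: color inside aes() ties point color to a variable and adds a legend
p_mapped <- p + geom_point(aes(x = x.coordinate, y = y.coordinate, color = cluster))

# Setting: color outside aes() applies one fixed color to every point
p_fixed <- p + geom_point(aes(x = x.coordinate, y = y.coordinate), color = "steelblue")

p_mapped
p_fixed
```

Writing aes(color = "steelblue") instead would treat the string as a one-level variable, so the points would get a palette color and a legend entry rather than the color named.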
# coord_fixed() gives both axes the same scale
p + geom_point(aes(x = x.coordinate, y = y.coordinate, color = cluster)) +
    scale_color_brewer(type = "qual", palette = "Set3") + coord_fixed()

# labs() sets axis labels, plot title and legend title
p + geom_point(aes(x = x.coordinate, y = y.coordinate, color = cluster)) +
    scale_color_brewer(type = "qual", palette = "Set3") + labs(x = "Dimension 1 [AU]",
    y = "Dimension 2 [AU]", title = "Antigenic cartography",
    color = "Cluster") + coord_fixed()

# theme_bw() switches to a black-and-white theme
p + geom_point(aes(x = x.coordinate, y = y.coordinate, color = cluster)) +
    scale_color_brewer(type = "qual", palette = "Set3") + labs(x = "Dimension 1 [AU]",
    y = "Dimension 2 [AU]", title = "Antigenic cartography",
    color = "Cluster") + coord_fixed() + theme_bw()

1. Type ?coord_ in a new chunk and press tab to see other options.
2. Earlier you chose a variable for the shape aesthetic. Add this aesthetic here and rename its legend title.
3. Try theme_void(), theme_dark() and theme_classic(). Similar to Exercise 1, you can type ?theme_ and press tab to see other possible built-in themes.

1. Type ?coord_ in a new chunk and press tab to see other options.

2. Earlier you chose a variable for the shape aesthetic. Add this aesthetic here and rename its legend title.

p + geom_point(aes(x = x.coordinate, y = y.coordinate, shape = type,
    color = cluster)) + scale_color_brewer(type = "qual", palette = "Set3") +
    labs(x = "Dimension 1 [AU]", y = "Dimension 2 [AU]", title = "Antigenic cartography",
    color = "Cluster", shape = "Measurement") + coord_fixed() +
    theme_bw()

3. Try theme_void(), theme_dark() and theme_classic(). Similar to Exercise 1, you can type ?theme_ and press tab to see other possible built-in themes.

p + geom_point(aes(x = x.coordinate, y = y.coordinate, shape = type,
    color = cluster)) + scale_color_brewer(type = "qual", palette = "Set3") +
    labs(x = "Dimension 1 [AU]", y = "Dimension 2 [AU]", title = "Antigenic cartography",
    color = "Cluster", shape = "Measurement") + coord_fixed() +
    theme_void()

p + geom_point(aes(x = x.coordinate, y = y.coordinate, shape = type,
    color = cluster)) + scale_color_brewer(type = "qual", palette = "Set3") +
    labs(x = "Dimension 1 [AU]", y = "Dimension 2 [AU]", title = "Antigenic cartography",
    color = "Cluster", shape = "Measurement") + coord_fixed() +
    theme_dark()

p + geom_point(aes(x = x.coordinate, y = y.coordinate, shape = type,
    color = cluster)) + scale_color_brewer(type = "qual", palette = "Set3") +
    labs(x = "Dimension 1 [AU]", y = "Dimension 2 [AU]", title = "Antigenic cartography",
    color = "Cluster", shape = "Measurement") + coord_fixed() +
    theme_classic()

# This version fails because the last line starts with +
p + geom_point(aes(x = x.coordinate, y = y.coordinate, color = cluster)) +
    scale_color_brewer(type = "qual", palette = "Set3") + labs(x = "Dimension 1 [AU]",
    y = "Dimension 2 [AU]", title = "Antigenic cartography",
    color = "Cluster") + coord_fixed()
+theme_light()

As indicated in the error message:
Error: Cannot use `+.gg()` with a single argument. Did you accidentally put
+ on a new line?
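A minimal sketch of the fix, using a hypothetical data frame toy in place of coord: R treats a line as finished as soon as it forms a complete expression, so the + has to sit at the end of the line that is being continued.

```r
library(ggplot2)

# Hypothetical toy data standing in for coord
toy <- data.frame(x.coordinate = c(1, 2, 3), y.coordinate = c(3, 1, 2))

# Works: each line ends with +, so R knows the expression continues
p_ok <- ggplot(data = toy) +
  geom_point(aes(x = x.coordinate, y = y.coordinate)) +
  theme_light()

# Fails: a line starting with + is parsed as a separate expression.
# ggplot(data = toy)
# + theme_light()

p_ok
```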
## Saving 8 x 6 in image
Note: ggsave overwrites the previous file of that name without warning!
-> labels and legend stay legible, make sure to always choose the right text sizes and image sizes in combination
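One way to control image size is to pass width and height to ggsave explicitly. The sketch below is an assumption about how this could be done, not the call used in the workshop: the data frame toy and the file name are hypothetical, and the file is written to a temporary directory so nothing existing is overwritten.

```r
library(ggplot2)

# Hypothetical toy data standing in for coord
toy <- data.frame(x.coordinate = c(1, 2, 3), y.coordinate = c(3, 1, 2))
p_toy <- ggplot(data = toy) +
  geom_point(aes(x = x.coordinate, y = y.coordinate)) +
  labs(title = "Antigenic cartography")

# Hypothetical output path in a temporary directory
out_file <- file.path(tempdir(), "antigenic_map.pdf")

# width and height are in inches by default; setting them explicitly avoids
# depending on the size of the current graphics device
ggsave(filename = out_file, plot = p_toy, width = 8, height = 6)

file.exists(out_file)
```

Text sizes are given in points and do not scale with the image, so the same plot saved at half the width and height shows relatively larger labels.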
# Bar chart: number of isolates per year, bars dodged by cluster
p + geom_bar(aes(x = year, fill = cluster), position = position_dodge(preserve = "single")) +
    scale_fill_brewer(type = "qual", palette = "Set3") + theme_bw()

# Histogram: isolates per 10-year bin, bars dodged by cluster
p + geom_histogram(aes(x = year, fill = cluster), position = position_dodge(preserve = "single"),
    binwidth = 10) + scale_fill_brewer(type = "qual", palette = "Set3") +
    theme_bw()

# Boxplot of year per measurement type, with the individual points jittered on top
p + geom_boxplot(aes(x = type, y = year, color = type)) + geom_jitter(aes(x = type,
    y = year, color = type)) + theme_bw()

# World map with the isolate locations colored by cluster
world <- ne_countries(scale = "medium", returnclass = "sf")
g <- ggplot()
g + geom_sf(data = world) + geom_point(data = coord, aes(x = lng,
    y = lat, color = cluster)) + scale_color_brewer(type = "qual",
    palette = "Set3")

1. Test different options for the position argument of geom_bar. Hint: use the Details paragraph in ?geom_bar to find a description of the possible options.
2. What happens when you set preserve="total" in position_dodge of geom_histogram?
3. What happens when you use aes(fill) instead of aes(color)?
4. Change geom_jitter to geom_point to see why geom_jitter is a better visualisation of the data. Go back to using geom_jitter and play with the width argument to customise your plot.
5. Which theme would suit the world map? Add it to the plot.

1. Test different options for the position argument of geom_bar. Hint: use the Details paragraph in ?geom_bar to find a description of the possible options.
Details
By default, multiple bars occupying the same x position will be stacked atop one another by position_stack(). If you want them to be dodged side-to-side, use position_dodge() or position_dodge2(). Finally, position_fill() shows relative proportions at each x by stacking the bars and then standardising each bar to have the same height.
p + geom_bar(aes(x = year, fill = cluster), position = position_stack()) +
    scale_fill_brewer(type = "qual", palette = "Set3") + theme_bw()

2. What happens when you set preserve="total" in position_dodge of geom_histogram?

Hint: In the help page for geom_histogram, click on position_dodge to get to the help for this function. There, you can see:
preserve Should dodging preserve the total width of all elements at a
position, or the width of a single element?
p + geom_histogram(aes(x = year, fill = cluster), position = position_dodge(preserve = "total"),
    binwidth = 10) + scale_fill_brewer(type = "qual", palette = "Set3") +
    theme_bw()
3. What happens when you use aes(fill) instead of aes(color)?

p + geom_boxplot(aes(x = type, y = year, fill = type)) + geom_jitter(aes(x = type,
    y = year, color = type)) + scale_fill_manual(values = c("#66c2a5",
    "#fc8d62")) + labs(x = "Measurement", y = "Time", color = "Measurement") +
    theme_bw()

4. Change geom_jitter to geom_point to see why geom_jitter is a better visualisation of the data. Go back to using geom_jitter and play with the width argument to customise your plot.

p + geom_boxplot(aes(x = type, y = year, color = type)) + geom_point(aes(x = type,
    y = year, color = type)) + theme_bw()

5. Which theme would suit the world map? Add it to the plot.

world <- ne_countries(scale = "medium", returnclass = "sf")
g <- ggplot()
g + geom_sf(data = world) + geom_point(data = coord, aes(x = lng,
    y = lat, color = cluster)) + scale_color_brewer(type = "qual",
    palette = "Set3") + theme_void()

Fundamentals of Data Visualization, Wilke (2019), with a free online version at https://serialmentor.com/dataviz/
An overview of the most appropriate graph for your data type: From data to viz, https://www.data-to-viz.com/
Smith, Derek J., Alan S. Lapedes, Jan C. de Jong, Theo M. Bestebroer, Guus F. Rimmelzwaan, Albert D. M. E. Osterhaus, and Ron A. M. Fouchier. 2004. “Mapping the Antigenic and Genetic Evolution of Influenza Virus.” Science 305 (5682). American Association for the Advancement of Science: 371–76. https://doi.org/10.1126/science.1097211.
Wilke, Claus O. 2019. Fundamentals of Data Visualization. 1st ed. O’Reilly Media, Inc. https://serialmentor.com/dataviz/.